Data Mining in the Clinical Research Environment

Author

  • Dave Smith
Abstract

Data mining has been widely adopted across many industries in recent years, largely because mining techniques can yield answers to business questions quickly and because large quantities of data are available to exploit. This paper discusses data and text mining in general before focusing on applications in the clinical research field. Of particular interest is the application of mining techniques to signal detection for adverse events. The value of these techniques is discussed, along with the place of data and text mining in the overall architecture of a SAS solution for pharmacovigilance.

WHAT IS DATA MINING?

Data mining is defined by SAS as the process of selecting, exploring, and modelling large amounts of data to uncover previously unknown patterns for business advantage. To expand on this, it is important to realise that data mining is a continuous process in which models are built, refined and managed over a period of time. The techniques used are largely iterative and empirical in nature, which is what makes the process continuous. Several different techniques are employed to gain value from the data, including graphical exploration and many different modelling and modification techniques; data mining is not the same as data exploration.

Data volumes are generally very large, because data mining techniques tend to be applied where the problem is not well understood and traditional parametric statistics have either failed or not been applied because of the complexity of the situation. Data mining is also often applied where the problem cannot easily be stated and a hypothesis needs to be generated. For example, the question could be “what significant associations exist between items in a typical shopping basket?” This might then lead to a question such as “do people who buy nappies usually buy beer at the same time?” (This is apparently true!)
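The shopping-basket question above can be made concrete with three standard association measures: support, confidence and lift. The following is a minimal sketch in Python rather than SAS. The baskets are invented purely for illustration, and the function names are assumptions of this example, not part of any SAS product.

```python
# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"nappies", "beer", "crisps"},
    {"nappies", "beer"},
    {"nappies", "milk"},
    {"beer", "crisps"},
    {"milk", "bread"},
    {"nappies", "beer", "milk"},
]


def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated share of baskets with the antecedent that also hold the consequent."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(set(antecedent) | set(consequent), baskets) / base


def lift(antecedent, consequent, baskets):
    """Confidence relative to how common the consequent is overall."""
    baseline = support(consequent, baskets)
    if baseline == 0:
        return 0.0
    return confidence(antecedent, consequent, baskets) / baseline


rule = ({"nappies"}, {"beer"})
print("support:   ", support(rule[0] | rule[1], baskets))
print("confidence:", confidence(*rule, baskets))
print("lift:      ", lift(*rule, baskets))
```

A lift above 1 suggests that the two items appear together more often than their individual frequencies would predict, which is the kind of pattern that turns an open question into a testable hypothesis.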
Data mining should always be done for business advantage, so being able to measure the outcome in business terms, and then to use that measure to compare the models produced by the data mining process, adds value and understanding.

THE DATA MINING PROCESS – SEMMA

To improve the usability of the SAS Enterprise Miner tool and to provide a framework that helps users get the most out of it, SAS has developed the SEMMA process:

  • Sample the data by creating one or more data tables. The samples should be large enough to contain the significant information, yet small enough to process. You may need to apply stratified sampling techniques to obtain a valid analysis of rare events, or not sample the data at all if there is insufficient volume to do so. Many data mining techniques (such as tree models or neural networks) employ learning algorithms and therefore require that the data be divided into two or, ideally, three parts to allow the algorithms to develop iteratively.
  • Explore the data by searching for anticipated relationships, unanticipated trends, and anomalies in order to gain understanding and ideas. This stage is very important in determining the success of the modelling stage; for example, a graph of the data might indicate that the data should be transformed, or that outliers should be removed. It is also likely to reveal variables that add no value and can safely be removed.
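The Sample step can be sketched as a stratified three-way partition. This is an illustration in Python under stated assumptions, not a description of how SAS Enterprise Miner implements sampling: the part names (train, validation, test), the 60/20/20 fractions, the `stratified_partition` helper and the synthetic records are all hypothetical. The split shown is proportional; oversampling of rare events is a different design choice and is not shown.

```python
import random
from collections import defaultdict


def stratified_partition(records, label_of, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split records into train/validation/test parts, one stratum at a time.

    Records sharing a label are shuffled and divided separately, so a rare
    outcome keeps roughly the same share in every part. The third fraction
    is implied: the test part receives whatever remains in each stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[label_of(record)].append(record)

    train, validation, test = [], [], []
    for members in strata.values():
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_valid = round(len(members) * fractions[1])
        train.extend(members[:n_train])
        validation.extend(members[n_train:n_train + n_valid])
        test.extend(members[n_train + n_valid:])
    return train, validation, test


# Hypothetical data: 1,000 case records, 5% of which report an adverse event.
records = [{"id": i, "event": i % 20 == 0} for i in range(1000)]
train, validation, test = stratified_partition(records, lambda r: r["event"])

for name, part in (("train", train), ("validation", validation), ("test", test)):
    events = sum(r["event"] for r in part)
    print(f"{name}: {len(part)} records, {events} with an event")
```

Splitting within each stratum, rather than shuffling the whole table once, guards against a part that happens to contain very few of the rare events it is meant to help model.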


Similar resources

Prediction of Student Learning Styles using Data Mining Techniques

This paper focuses on the prediction of student learning styles using data mining techniques within their institutions. This prediction was aimed at finding out how different learning styles are achieved within learning environments which are specifically influenced by already existing factors. These learning styles have been affected by different factors that are mainly engraved and found wit...


Comparison of Decision Tree and Naïve Bayes Methods in Classification of Researcher’s Cognitive Styles in Academic Environment

In today's world of the internet, it is important to give users feedback based on what they demand. Moreover, one of the important tasks in data mining is classification. Today, there are several classification techniques for solving classification problems, such as Genetic Algorithm, Decision Tree, Bayesian and others. In this article, it is attempted to classify researchers as “Expert” and “No...


Tire demand planning based on reliability and operating environment

Tires represent a critical spare part in mines. There is a shortage of medium and large tires. In addition, with increased mining activities and the creation of new mines, the demand for tires has increased significantly. Thus, it is particularly important for mining engineers to identify tire characteristics and correctly manage the spare part inventory. Spare parts management is critical from...


Data Mining in R using Rattle

This paper is a brief introduction to the concepts, methods and algorithms for data mining in the statistical software R using a package named Rattle. Rattle provides a good graphical environment to perform some of the procedures and algorithms without the need for programming. Some parts of the package will be explained by a number of examples. ...


Data Mining: A Novel Outlook to Explore Knowledge in Health and Medical Sciences

Today the medical and healthcare industries generate large volumes of diverse data about patients, disease diagnosis, prognosis, management, hospitals’ resources, electronic patient health records, medical devices, etc. Using the most efficient processing and analysis method for knowledge extraction is key to cost saving in clinical decision making. Data mining, sometimes called data or knowledge...




Publication date: 2012